3 research outputs found
Are We There Yet?: The Development of a Corpus Annotated for Social Acts in Multilingual Online Discourse
We present the AAWD and AACD corpora, a collection of discussions drawn from Wikipedia talk pages and small group IRC discussions in English, Russian and Mandarin. Our datasets are annotated with labels capturing two kinds of social acts: alignment moves and authority claims. We describe these social acts and our annotation process, highlight challenges we encountered and strategies we employed during annotation, and present some analyses of the resulting data set which illustrate the utility of our corpus and identify interactions among social acts, and between participant status and social acts, in online discourse.
Markers of contrast in Russian: A corpus-based study
Thesis (Master's)--University of Washington, 2013
AGGREGATION
This archive is associated with the AGGREGATION project, which seeks to automatically generate HPSG grammars on the basis of Interlinear Glossed Text data. For a detailed description of this project, see Chapter 3 of Inferring Grammars from Interlinear Glossed Text: Extracting Typological and Lexical Properties for the Automatic Generation of HPSG Grammars, PhD thesis by Kristen Howell, 2020.
This archive includes the following:
The AGGREGATION/BASIL syntactic inference repository from https://git.ling.washington.edu/agg/aggregation
The MOM morphological inference repository from https://git.ling.washington.edu/agg/mom
The Xigt framework for eXtensible Interlinear Glossed Text release 1.1 from https://github.com/xigt/xigt
The Grammar Matrix Customization system http://matrix.ling.washington.edu/index.html
Code, dependencies and sample data for running the AGGREGATION pipeline end to end.
The AGGREGATION Project aims to bring the benefits of grammar engineering to language documentation without requiring field linguists to become grammar engineers. We achieve this by automatically creating precision grammars on the basis of analyses and annotations already produced by field linguists together with a typologically-grounded cross-linguistic grammar resource (the LinGO Grammar Matrix) and natural language processing techniques developed for high-resource languages.
Precision grammars are machine-readable encodings of mutually-consistent linguistic hypotheses, in our case, concerning morphotactics, morphosyntax and the syntax-semantics interface. They can be used to automatically process text, assigning structures to input strings and strings to input semantic representations. Text processed in this way can then be searched for sentences or word forms with structures of interest or items that are not covered by the grammar (i.e. fall outside current hypotheses).
National Science Foundation under Grant No. BCS-1160274 (PI Bender)
National Science Foundation under Grant No. BCS-1561833 (PI Bender)